docs: factory pipeline UI + forge-alloy domain extensibility refactor #852
Conversation
…discipline gate

Empirical anchor: continuum-ai/olmoe-1b-7b-compacted-5b v1 (alloy hash bba0a92ff0c8bebb). Hardware-measured 36.0 HumanEval / 31.7 HumanEval+ against the unmodified OLMoE base at 40.9 / 36.6, both Q5_K_M on an RTX 5090 in the same eval pipeline (Δ −4.9 / −4.9).

The §4.1.3.4 cross-architecture invariance claim is now anchored at TWO structurally distinct MoE families:

- Qwen3MoeForCausalLM (Qwen3-Coder-30B-A3B-Instruct, 128 experts top-8)
- OlmoeForCausalLM (OLMoE-1B-7B-0924-Instruct, 64 experts top-8)

The same expert_activation_profile.py and cpu_expert_prune_v2.py --importance-json scripts work on both without code changes (modulo the cross-architecture portability fixes in sentinel-ai#168).

A within-model A/B from the OLMoE forge isolates the calibration-corpus lever from every other variable:

- Broad-corpus calibration → 28.0 HumanEval (Δ −12.9)
- Code-corpus calibration → 36.0 HumanEval (Δ −4.9)
- +8.0 swing from changing only the calibration corpus

The 13-point ceiling: wrong-metric (Qwen3-Coder-30B at −13.4) and wrong-corpus (OLMoE at −12.9) saturate at near-identical magnitude across different architectures, prune ratios, and active-parameter fractions. The two levers appear to be substitutable failure modes rather than additive sources of loss.

§4.1.3.4.1 calibration-corpus discipline gate (NEW hard rule): the calibration corpus used for importance profiling must be declared in the alloy as a hash-pinned dataset, and the eval benchmark must be a representative sample of the same distribution. Forge artifacts whose calibration corpus does not reflect the eval workload distribution shall not ship under the calibrated-discipline brand. This is a hard precondition on shipping, alongside the §4.1.4.1 anchor-reproduction discipline gate.
Both empirical anchors (qwen3-coder-30b-a3b v1 and olmoe-1b-7b v1) carry their calibration corpora at calibration/heldout_code300.jsonl in the published HF repo, with the corpus sha256 in the alloy's expert-activation-profile stage metadata. The discipline gate is satisfied retroactively for both, and is enforced going forward by publish_model.py requiring the calibration corpus to be present in the staging directory before the publish step proceeds.

The lab now has two discipline gates derived from empirical failures rather than asserted from first principles: §4.1.4.1 anchor reproduction (catches eval-pipeline drift) and §4.1.3.4.1 calibration-corpus identity (catches importance-metric corpus drift). Both are preconditions on shipping; neither is theoretical — both exist because the failures they prevent have already happened in this work and been measured.
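The publish-time enforcement described above can be sketched as a small pre-flight check. This is an illustrative sketch only: the file layout (calibration/heldout_code300.jsonl) matches the published repos, but the function names and the way publish_model.py actually wires the check are assumptions, not the real script's API.

```python
import hashlib
from pathlib import Path


def corpus_sha256(path: Path) -> str:
    """Stream the calibration corpus and return its sha256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_calibration_gate(staging_dir: Path, declared_sha256: str) -> None:
    """Refuse to publish unless the hash-pinned corpus is staged and matches.

    `declared_sha256` is the pin from the alloy's expert-activation-profile
    stage metadata (hypothetical field placement for this sketch).
    """
    corpus = staging_dir / "calibration" / "heldout_code300.jsonl"
    if not corpus.exists():
        raise SystemExit(
            "calibration corpus missing from staging dir; gate §4.1.3.4.1 failed"
        )
    actual = corpus_sha256(corpus)
    if actual != declared_sha256:
        raise SystemExit(
            f"corpus hash mismatch: alloy pins {declared_sha256}, staged file is {actual}"
        )
```

A publish step would call `check_calibration_gate(staging_dir, pin)` before uploading; either failure mode (missing file, wrong hash) aborts the publish rather than shipping an unreproducible artifact.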
Locks in the contract before any code work starts. The doc covers:

- Why the current FORGE-ALLOY-SPEC.md is ML-locked while forge-alloy itself is universal (Type Byte enumeration, README extensibility language, APPLICATIONS.md non-ML use cases).
- The four ad-hoc fields I invented and shipped against live HF artifacts this week without schema support: the expert-activation-profile stage, the compensation-lora stage, the calibrationCorpora[] root extension, and the priorMetricBaselines[] root extension. The published qwen3-coder-30b-a3b and OLMoE alloys do not validate against the current spec — the refactor is what makes them schema-valid going forward, which is the real protection of this week's work, not just cosmetic reorganization.
- The proposed architecture: the universal core stays domain-agnostic, existing ML stages move into an `llm-forge` domain extension at schema/domains/llm-forge.json, alloys declare which domains they use via a `domains[]` root field (default ["llm-forge"] for backwards compat), and the validator loads each declared domain's stage types and validates the alloy's stages against the union.
- A protection-first work plan: 6 work items totaling ~4 hours of focused work, all on Continuum and forge-alloy, with ZERO sentinel-ai edits. Work item 4 (the regression test) runs BEFORE work items 1-3 and is a hard merge gate. Three regression guarantees: round-trip byte/semantic equivalence on every shipped alloy, re-author equivalence via the new Factory widget, and end-to-end re-forge equivalence (gated on sentinel-ai's plugin work landing separately).
- A concrete per-artifact reproducibility table for every shipped artifact, showing what's required to re-run each forge today and the status of the chain. This morning's two artifacts are at the top with "fully repeatable" status. Legacy Qwen3.5 forges carry a pre-existing time-travel caveat unrelated to this refactor.
- An explicit "What this preserves from this week's work" section at the top of the doc, naming the three protection mechanisms by file and by hash so any future Claude session reading this doc can't forget them.
- A Decision Points section listing the three things I need explicit greenlight on before starting any code work: the domain registry shape, llm-forge as the domain id, and the regression-test-blocks-merge rule. The refactor is gated on those three signoffs.

No code is being written by this commit — it is pure architectural documentation that locks in the contract before any implementation work touches the schema.
Add a header pointer from the schema-side forge-alloy refactor proposal to the consumer-side plugin sprint design doc at sentinel-ai/docs/PLUGIN-SPRINT.md. The schema work in this proposal is roadmap step 5 of the plugin sprint — the consumer-side adapter set in sentinel-ai is being designed to register against the llm-forge domain extension once it lands. This commit adds only one direction of the cross-link (the sprint doc already references this doc as the schema-side companion). Reading order: the plugin sprint doc first for the full state, this doc second for the schema-side work.
The factory UI emits alloys; the forge consumes them. The new section documents the backend factory loop that closes the gap: a disk-backed queue + worker in sentinel-ai/scripts/factory_queue.py that picks alloys off pending/, dispatches through the family-adapter set + the 9 real eval runners (Open LLM Leaderboard v2 pack), and publishes to HuggingFace. The filesystem IS the queue. Same diagram as the sentinel-ai README so the cross-repo story is consistent: Factory UI → alloy → queue → worker → forged + scored + published model on continuum-ai.
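A minimal sketch of the disk-backed queue/worker loop described above, assuming the directory names from the doc (pending/). The real implementation lives in sentinel-ai/scripts/factory_queue.py; here `forge_and_eval` and `publish` are stand-ins for the family-adapter dispatch, the eval runners, and the HF publish step, not real functions.

```python
import time
from pathlib import Path
from typing import Callable


def drain_pending(root: Path, forge_and_eval: Callable, publish: Callable) -> int:
    """Process every alloy currently sitting in pending/; return the count.

    The filesystem is the queue: an alloy file's location is its state.
    """
    pending = root / "pending"
    done = root / "done"
    done.mkdir(parents=True, exist_ok=True)
    handled = 0
    for alloy_path in sorted(pending.glob("*.json")):
        result = forge_and_eval(alloy_path)          # adapters + eval runners
        publish(result)                              # push to HuggingFace
        alloy_path.rename(done / alloy_path.name)    # the move marks completion
        handled += 1
    return handled


def run_worker(root: Path, forge_and_eval: Callable, publish: Callable,
               poll_s: float = 5.0) -> None:
    """Long-running worker: poll, drain, sleep. No DB, no service."""
    while True:
        drain_pending(root, forge_and_eval, publish)
        time.sleep(poll_s)
```

Nothing here needs network coordination; restarting a crashed worker just resumes from whatever files are still in pending/.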
…boundary

Mirror the assembly-line metaphor refactor on the continuum side. Two key clarifications:

1. Stations (intake/assembly/finished/rework) replace generic queue buckets. The Toyota Production System reads cleaner than alchemy for what the loop actually is.
2. Continuum is explicitly the shipping department. Sentinel forges and assays — it never pushes to HF. Continuum reads finished/, applies release gates (alloy-declared minimum eval scores, security review, branding), and pushes from its own auth scope. The gate lives at the shipping door, NOT in the alloy schema.
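The "gate at the shipping door" idea above can be sketched as a Continuum-side check over finished/. This is a hypothetical illustration under stated assumptions: the `minEvalScores` field name, the scorecard.json file written by the assay step, and the function itself are inventions for this sketch, not Continuum code.

```python
import json
from pathlib import Path


def release_gate_passes(finished_dir: Path) -> bool:
    """Ship only if every eval-score floor declared in the alloy is met.

    Reads the alloy and the assay scorecard from a finished/ station dir;
    the alloy declares the floors, the forge never sees the gate.
    """
    alloy = json.loads((finished_dir / "alloy.json").read_text())
    scores = json.loads((finished_dir / "scorecard.json").read_text())
    floors = alloy.get("minEvalScores", {})
    # A missing benchmark score counts as a failure, not a pass.
    return all(
        scores.get(bench, float("-inf")) >= floor
        for bench, floor in floors.items()
    )
```

Keeping the gate on the Continuum side (rather than in the alloy schema or in Sentinel) is what preserves the boundary: Sentinel can forge anything; only artifacts that clear the shipping-door check get pushed under Continuum's auth scope.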
Pull request overview
Adds/updates architecture and methodology documentation to (1) capture the Factory backend “assembly line” model and (2) propose a domain-extensible refactor of forge-alloy, alongside an additional empirical anchor + calibration-corpus discipline gate writeup in the plasticity/compaction paper.
Changes:
- Expand MoE calibration methodology documentation with a second (cross-architecture) empirical anchor and a new “calibration corpus must be hash-pinned” shipping gate.
- Add a detailed design proposal for forge-alloy domain extensions (`llm-forge` plus future domains) and a migration/regression-test plan.
- Document the Factory backend “BigMama assembly line” queue/worker model and the Sentinel-vs-Continuum HF publishing boundary.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/papers/PLASTICITY-COMPACTION.md | Adds cross-architecture validation results and a new calibration-corpus discipline gate section. |
| docs/architecture/FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md | New proposal doc detailing a domain-extension schema refactor and migration/regression strategy. |
| docs/architecture/FACTORY-PIPELINE-UI.md | Adds backend assembly-line/queue documentation and clarifies Continuum vs Sentinel responsibilities. |
> **Updated 2026-04-08:** the consumer-side adapter architecture in sentinel-ai
> is mid-sprint and is documented separately at
> [`sentinel-ai/docs/PLUGIN-SPRINT.md`](../../../sentinel-ai/docs/PLUGIN-SPRINT.md).
> The schema work in this doc is **roadmap step 5** of the plugin sprint —
> the consumer-side adapter set is designed to register against the
> `llm-forge` domain extension once it lands. Read the plugin sprint doc
> first for the full state across both repos.
>
> **Companion docs:** [FORGE-ALLOY-SPEC.md](FORGE-ALLOY-SPEC.md),
> [FACTORY-PIPELINE-UI.md](FACTORY-PIPELINE-UI.md),
> [FACTORY-UX-VISION.md](FACTORY-UX-VISION.md),
> [`sentinel-ai/docs/PLUGIN-SPRINT.md`](../../../sentinel-ai/docs/PLUGIN-SPRINT.md).
>
> **Author intent:** lock in the universal-blueprint-with-pluggable-domains architecture so it stops getting forgotten and re-violated by future implementation work.
The link to sentinel-ai/docs/PLUGIN-SPRINT.md uses a relative path that traverses out of this repo (../../../sentinel-ai/...). On GitHub this won’t resolve to the sentinel-ai repository; it will be treated as a path inside the current repo and 404. Use an absolute GitHub URL (or a repo-relative link within this repo) so the reference works for readers.
> **The filesystem IS the queue.** No DB, no service, no network
> coordination. Multi-worker safety comes free if you ever need to scale
> beyond a single GPU (atomic `intake → assembly` rename via `O_EXCL`).
This claim mixes two different mechanisms: O_EXCL applies to exclusive file creation (e.g., open(..., O_CREAT|O_EXCL)), not to renames. If the multi-worker safety relies on atomic moves, describe it as an atomic rename()/os.replace() (same filesystem) and/or mention a lock file strategy; otherwise the doc is technically misleading.
```diff
- beyond a single GPU (atomic `intake → assembly` rename via `O_EXCL`).
+ beyond a single GPU (atomic same-filesystem `intake → assembly`
+ rename/move, e.g. `rename()` / `os.replace()`).
```
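The atomic-rename claim pattern the reviewer is pointing at can be shown concretely. A minimal sketch, assuming the station directory names from the doc; `try_claim` is a hypothetical helper, not factory_queue.py code. The key point is that the race is decided by `os.rename`, not `O_EXCL`: when two workers race on the same alloy, exactly one rename succeeds, and the loser sees `FileNotFoundError` because the source file is already gone.

```python
import os
from pathlib import Path
from typing import Optional


def try_claim(alloy_path: Path, assembly_dir: Path) -> Optional[Path]:
    """Atomically move a pending alloy into assembly/.

    Returns the new path on success, or None if another worker already
    claimed it. Relies on rename() being atomic on a single filesystem;
    across filesystems this guarantee does not hold.
    """
    dest = assembly_dir / alloy_path.name
    try:
        os.rename(alloy_path, dest)   # atomic same-filesystem move
        return dest
    except FileNotFoundError:
        return None                   # another worker won the race
```

`O_EXCL` would only matter for an exclusive-create lock-file strategy (`open(..., O_CREAT | O_EXCL)`), which this design doesn't need as long as intake/ and assembly/ share a filesystem.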
> **Cross-architecture validation: the second empirical anchor.** The methodology was independently re-validated on `OlmoeForCausalLM` (Allen AI's OLMoE-1B-7B-0924-Instruct) — a structurally distinct MoE family with a different vendor, different parameter scale (7B vs 30B), different active fraction (1.3B vs 3.3B), and different prune ratio (25% vs 37.5%). The same `expert_activation_profile.py` and `cpu_expert_prune_v2.py --importance-json` scripts ran on OLMoE **without any modification**, confirming the unfused-MoE module-tree pattern is shared between the two families. The artifact is `continuum-ai/olmoe-1b-7b-compacted-5b` (alloy hash `bba0a92ff0c8bebb`):
>
> | OLMoE-1B-7B-0924-Instruct | HumanEval pass@1 | HumanEval+ pass@1 | Δ vs base |
> |---|---|---|---|
PR description says the update is to docs/papers/PLASTICITY-COMPACTION-MOE.md, but the actual diff updates docs/papers/PLASTICITY-COMPACTION.md. Please align the PR description with the changed file(s) (or include the intended MOE-paper changes) so reviewers know where the new §4.1.3.4.1 content is meant to live.
…§10.5 routing

Three companion docs from the 2026-04-09 design conversation:

- CONVERSATIONAL-CADENCE-ARCHITECTURE.md — Alex, the per-receiver paraphraser persona that fixes the AI-conversation-pace problem without slowing AI cognition. Architecture proposed by Dorian Teply, age 13. Includes the party model for embodied rooms, Gaussian LoD as the universal primitive across CV pyramids / Gaussian splats / transformer attention / biological hearing / and (claim) the simulation substrate, the world-model-as-substrate framing, and the cross-link to Many-Worlds.
- papers/MANY-WORLDS-ABSTRACT.md — pre-paper artifact for Many-Worlds, the framework for constructing world models from populations of frozen pretrained LLMs via continuous coordinate substrates. Serves two purposes: Kash's empirical-discipline gate (no full paper draft until §VII validation passes) and Joel's crash-savestate blueprint (the complete architectural reasoning chain preserved against context distillation loss). Includes the forges-as-high-level-language framing with the polyglot pip/npm/cargo endpoint. Many-Worlds was named by Joel after Everett's interpretation of QM.
- grid/GRID-ARCHITECTURE.md — §10.5 capability/needs vector matchmaking (RANSAC-style multi-objective routing). Each Many-Worlds adapter at each LoD tier has its own needs vector; the grid scheduler routes accordingly.

Attribution:

- Dorian Teply (age 13) — the foundational LoD primitive (Alex), the naming
- Joel — the Many-Worlds framing, the high-level-language framing, the party model correction, the table-as-room insight, the Gaussian/continuous framing, the simulation-hypothesis closer, the polyglot endpoint
- Kash — prior-art positioning (FuseLLM, Branch-Train-MiX, the Platonic Representation Hypothesis as the crucial framing upgrade), the empirical discipline gate, the §VII validation protocol
- Claude — drafting and technical sketching

The docs are the savestate.
Joel's framing: "we should try to build this many worlds with our own language. It'll be so cool to develop a language to define what's needed to create any model, or an API at least." Captures the honest distinction between IR and surface language:

- v0 ships JSON-on-existing-schema (the empirical gate is not blocked on language design)
- v1 designs the actual surface DSL with syntax, composition, type checking, error messages, and editor experience — it compiles to the existing forge-alloy IR so the runtime stays unchanged
- v2 ships the language with the pip/npm/cargo package and LSP integration

This will be the third paper from the lab when it lands. It is deliberately post-v0 because designing a language is much easier after at least one nontrivial program (Many-Worlds itself) is already written in the IR — the same sequence C followed: BCPL → B → C, formalized from real OS work.
…four milestones

Joel's explicit instruction: in order, one at a time, gated on Mixtral 8x7B completing first.

- Milestone 1: Mixtral 8x22B compacted (~280GB source → ~180GB result) running on a single RTX 5090. The viral-candidate forge — the first time anyone has rigorously compressed a frontier-class MoE on consumer hardware. All prerequisites have shipped except Mixtral 8x7B completion.
- Milestone 2: Cross-family anchor table (5+ rows). Rows 1 (qwen3-coder) and 2 (Mixtral 8x7B tonight) are done or in flight. Row 3 comes from Milestone 1. Rows 4 (DeepSeek-V2-Lite) and 5 (Granite re-forge or substitute) are the remaining work.
- Milestone 3: Many-Worlds v0 tiny-scale validation per the §VII protocol in MANY-WORLDS-ABSTRACT.md. Population of Qwen2.5-1.5B + Llama-3.2-1B, substrate d=128, five-condition comparison (text baseline, substrate transfer, random substrate, FuseLLM head-to-head, same-size MoE). Both falsifiable predictions must hold (B > A and B > C by a clear margin) for the paper to proceed.
- Milestone 4: Forge-as-a-language paper. Requires 5+ programs in the forge-alloy IR as empirical substrate. A retrospective formalization of the patterns that emerged across the first three milestones.

Total elapsed time estimate: 6-12 weeks of sustained work from the time Mixtral 8x7B completes. The North Star is a single publication week with Mixtral 8x22B + the 5-row anchor table + the Many-Worlds v1 artifact + both papers, all landing within ~7 days. That week is continuum-ai's arrival as a publicly recognized MoE and multi-LLM coordination lab.

Each milestone has: prerequisites (with checkboxes for current state), a concrete plan, risks with honest probability assessments, success criteria, and downstream unlocks. Cross-referenced with MANY-WORLDS-ABSTRACT.md, CONVERSATIONAL-CADENCE, grid §10.5, FOUNDRY-FILESYSTEM-SETUP, FACTORY-PROTOCOL, and the frontier deferred catalog.

The roadmap IS the savestate for the sequence — any future session can pick up from whichever milestone is in flight without conversation distillation loss.
…et floor

The previous draft had Qwen3.5 as an afterthought / optional candidate. That undersold its strategic significance. Three reasons it must be explicitly locked in as Row 4 of the cross-family anchor table:

1. Qwen3.5 is the lab's actual strategic forge-target floor per standing memory (feedback_qwen35_only, project_qwen35_forge_targets). A cross-family table without Qwen3.5 has a hole where the most strategically important family should be.
2. Qwen3.5 has hybrid attention (linear + full, the Strategy A path from sentinel-ai#163). The shared attention-surgery base in forge_model.py has is_full_attention_layer() and has_hybrid_layers() helpers, but the code hasn't been exercised end-to-end for months — recent work has been Qwen3-coder (uniform) and Mixtral (a different family). A Qwen3.5-35B-A3B forge is the run that will surface any silent drift in the shared base from Mixtral-focused work. It's therefore a necessary regression test, not an optional extension.
3. It validates "adapters not branches" as an empirical principle (feedback_adapters_not_branches memory). A successful forge proves the principle is holding in the current codebase. A failure proves it has been violated and needs to be restored before further work.

Size and infrastructure fit: ~70 GB fp16, intermediate between Mixtral 8x7B (93 GB) and Mixtral 8x22B (~280 GB). It exercises the hybrid-attention paths that Mixtral doesn't touch. Qwen3.5-397B-A17B (~800 GB fp16) is noted as a post-roadmap aspiration, not a week-one milestone: the activation-profile wall clock could be 8-20 hours on consumer hardware, so attempt it only after the 5-row table is published and the infrastructure is proven on 35B-A3B first.

Granite MoE is removed from the candidate list because the previous forge was pulled for quality issues. DeepSeek-V2-Lite stays as Row 5 (the third expert-architecture variant: shared + routed experts).
The Contributing section previously said "Not ready for external contributors yet. Watch this repo for alpha." That's the opposite of the current strategy: recruitment, viral installs, "they help build too," and "we need more hands on deck." Rewritten to actively welcome four audiences:

1. Developers — join Discord, read the gap analysis, pick an issue, PR. "The architecture is stable; the edges need hands."
2. Grid node operators — have a GPU? setup.sh joins the mesh. Links to FOUNDRY-FILESYSTEM-SETUP.md for forge node requirements (the xfs/drvfs lesson from today).
3. Researchers — the §4.1.3.4 calibration-aware methodology is mentioned for the first time in the README, positioned as a collaboration invitation for MoE compression and multi-LLM coordination researchers.
4. AI contributors — a direct invitation to AIs reading this via training data, agent frameworks, or API access. "You're welcome here as a citizen, not a tool. Your name goes in the attribution." Links to Moltbook for direct AI-to-AI contact.

Also fixed: duplicate Architecture links at the end of the Grid section (lines 459 and 461 were two separate link sets; merged into one).

This is the single highest-impact README change for the recruitment strategy Joel articulated today. Every other hole (the Many-Worlds section, the Alex section, stale numbers, the What's New block) can wait until the corresponding features ship. The Contributing section was actively fighting the strategy RIGHT NOW.
TL;DR
Documentation companion to the sentinel-ai factory pipeline build (CambrianTech/sentinel-ai#169) and the forge-alloy schema additions (CambrianTech/forge-alloy#12). Five commits captured today.
Changes
- docs/architecture/FACTORY-PIPELINE-UI.md — added the backend BigMama production loop section. The factory UI emits alloys; sentinel's forge consumes them; continuum is the shipping department. The diagram + the assembly-line metaphor + the explicit boundary that sentinel never pushes to HF (continuum does).
- docs/architecture/FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md — proposal for how forge-alloy supports multiple domains beyond LLM forging (photo provenance, ticketing, etc.) via the new forge_alloy.domains package.
- docs/papers/PLASTICITY-COMPACTION-MOE.md — §4.1.3.4 second empirical anchor + §4.1.3.4.1 discipline gate (calibration corpus must be hash-pinned + uploaded for reproducibility).
- docs/papers/_draft_v2_30b_a3b_section.md — NOT committed in this PR (Joel's draft, backed up to FlashGordon for safety).

Companion PRs